Tagging Portuguese With A Spanish Tagger

نویسندگان

  • Jirka Hana
  • Anna Feldman
  • Luiz Amaral
  • Chris Brew
چکیده

We describe a knowledge and resource light system for an automatic morphological analysis and tagging of Brazilian Portuguese.1 We avoid the use of labor intensive resources; particularly, large annotated corpora and lexicons. Instead, we use (i) an annotated corpus of Peninsular Spanish, a language related to Portuguese, (ii) an unannotated corpus of Portuguese, (iii) a description of Portuguese morphology on the level of a basic grammar book. We extend the similar work that we have done (Hana et al., 2004; Feldman et al., 2006) by proposing an alternative algorithm for cognate transfer that effectively projects the Spanish emission probabilities into Portuguese. Our experiments use minimal new human effort and show 21% error reduction over even emissions on a fine-grained tagset.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Tagging Portuguese with a Spanish Tagger Using Cognates

We describe a knowledge and resource light system for an automatic morphological analysis and tagging of Brazilian Portuguese.1 We avoid the use of labor intensive resources; particularly, large annotated corpora and lexicons. Instead, we use (i) an annotated corpus of Peninsular Spanish, a language related to Portuguese, (ii) an unannotated corpus of Portuguese, (iii) a description of Portugue...

متن کامل

Mining for unambiguous instances to adapt part-of-speech taggers to new domains

We present a simple, yet effective approach to adapt part-of-speech (POS) taggers to new domains. Our approach only requires a dictionary and large amounts of unlabeled target data. The idea is to use the dictionary to mine the unlabeled target data for unambiguous word sequences, thus effectively collecting labeled target data. We add the mined instances to available labeled newswire data to t...

متن کامل

WANN-Tagger - A Weightless Artificial Neural Network Tagger for the Portuguese Language

Weightless Artificial Neural Networks have proved to be a promising paradigm for classification tasks. This work introduces the WANN-Tagger, which makes use of weightless artificial neural networks for labelling Portuguese sentences, tagging each of its terms with its respective part-of-speech. A first experimental evaluation using the CETENFolha corpus indicates the usefulness of this paradigm...

متن کامل

Part-of-Speech Tagging for English-Spanish Code-Switched Text

Code-switching is an interesting linguistic phenomenon commonly observed in highly bilingual communities. It consists of mixing languages in the same conversational event. This paper presents results on Part-of-Speech tagging Spanish-English code-switched discourse. We explore different approaches to exploit existing resources for both languages that range from simple heuristics, to language id...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006